1. Field of the Invention
This invention relates to the playback of speech recordings. More particularly, this invention relates to computerized methods and systems for suppressing pauses during the playback of audio recordings for transcription. The methods and systems are implemented in computer hardware and software, and in speech playback equipment controlled by computer hardware and software. This invention is related to a co-pending application, filed on even date herewith, entitled "A Method and System for Performing Text Edits During Audio Recording Playback."
2. Background Information
Dictation and transcription of recorded speech are commonly used in a variety of professions, such as the legal and medical fields. Transcription is typically done by human transcriptionists who listen to an audio recording of a dictation and type the recorded speech into a word processor. The playback of the audio recording is typically controlled by a foot pedal, which allows the transcriptionist to start or stop the advance or the "rewind" of recorded speech without having to remove his or her hands from the keyboard. If the audio recording contains a large number of pauses, however, the productivity of the transcriptionist decreases, because he or she must wait through the pauses to be able to hear and identify the resumption of the dictation.
Pauses may be of two varieties: silent pauses or filled pauses. Throughout this specification the term "silent pauses" will be used to refer to pauses during the dictation with no speech whatsoever. Silent pauses may occur for a variety of reasons. The speaker, for example, may pause to think or may be interrupted while recording speech. The term "filled pause" will be used to refer to pauses during the dictation that the speaker fills with "words" such as "um" or "ah" that have no meaning for purposes of transcription. Both silent pauses and filled pauses in speech decrease the productivity of the transcriptionist by forcing that person to wait through the pauses for the resumption of transcribable speech.
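For illustration only, the notion of a silent pause defined above can be sketched as a run of low-energy audio frames that lasts longer than some minimum duration. The sketch below is not the method of this specification. The function name `find_silent_pauses`, the frame length, the energy threshold, and the minimum pause duration are all hypothetical values assumed for the example. Filled pauses such as "um" or "ah" carry energy and would not be found by a simple energy test of this kind.

```python
from typing import List, Sequence, Tuple


def find_silent_pauses(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 20,              # assumed frame length
    energy_threshold: float = 0.01,  # assumed RMS level treated as "no speech"
    min_pause_ms: int = 500,         # assumed shortest run counted as a pause
) -> List[Tuple[float, float]]:
    """Return (start_seconds, end_seconds) for each run of quiet frames
    lasting at least min_pause_ms. Samples are floats in [-1.0, 1.0]."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    min_frames = max(1, min_pause_ms // frame_ms)
    n_frames = len(samples) // frame_len  # trailing partial frame is ignored

    pauses: List[Tuple[float, float]] = []
    run_start = None  # index of the first frame in the current quiet run

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = (sum(x * x for x in frame) / len(frame)) ** 0.5
        if rms < energy_threshold:
            if run_start is None:
                run_start = i
        else:
            # A loud frame ends the quiet run; keep it only if long enough.
            if run_start is not None and i - run_start >= min_frames:
                pauses.append((run_start * frame_len / sample_rate,
                               i * frame_len / sample_rate))
            run_start = None

    # Handle a quiet run that extends to the end of the recording.
    if run_start is not None and n_frames - run_start >= min_frames:
        pauses.append((run_start * frame_len / sample_rate,
                       n_frames * frame_len / sample_rate))
    return pauses
```

Under these assumptions, a recording consisting of one second of speech, one second of silence, and one more second of speech would yield a single pause spanning roughly the second from 1.0 to 2.0, while a 200 millisecond gap between words would fall below the assumed minimum and be ignored.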
Speech recognition systems are available that are capable of performing full word-level recognition of speech. Some of the available speech recognition systems are usable for transcription because they are capable of outputting spoken words as text that may be edited using typical word processing software. If a speech recognition system could perform perfect transcription, the output text would need little or no editing to appear as accurate transcribed text. However, even if the speech recognition system were nearly flawless, speech that is not meant to be part of the transcribed text may appear in the text output of the speech recognition system. Such speech includes spoken punctuation, paragraph markers, corrections or other instructions for a transcriptionist, and phonemic transcriptions of filled pause sounds. Background speech, such as a conversation between the dictator and another person that is not meant to be recorded, may also become part of the transcribed speech. Therefore, even if a speech recognition system were nearly flawless, there would typically be problems with the transcribed text output.
Speech recognition systems may also have trouble producing quality results if a speaker has a strong accent or speaks with poor grammar. In many situations, therefore, a transcriptionist is needed to edit the text output by a speech recognition system to produce quality transcribed text. Such editing may require replay of all or portions of the original recording. In many of the "hard cases," where the speaker speaks poorly or the recording contains substantial background speech or noise, it may be easier for a transcriptionist to transcribe the recorded speech from scratch, without the aid of a "first draft" by a speech recognition system.
A method and system are needed that allow a transcriptionist to control the playback of speech so that silent and filled pauses do not substantially decrease the productivity of the transcriptionist by forcing that person to wait through the pauses before resuming transcription or editing of the dictation.